All Questions

2 questions

2 votes

2 answers

301 views

Advantage computed the wrong way?

Here is some code written by Maxim Lapan. I am reading his book, Deep Reinforcement Learning Hands-On, and one line in his code looks really odd to me. In the accumulation of the policy gradient $...

jgauth

asked May 14, 2020 at 21:47

1 vote

1 answer

257 views

Once the environments are vectorized, how do I have to gather immediate experiences for the agent?

My main goal right now is to train an agent with the A2C algorithm to solve the Atari Breakout game. So far I have managed to write that code with a single agent and a single environment. To break the ...

jgauth

asked May 11, 2020 at 15:41
